-
Notifications
You must be signed in to change notification settings - Fork 14.5k
[DAG] SelectionDAG::canCreateUndefOrPoison - Mark AVGFLOORS and AVGCEILS as safe #148191
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@llvm/pr-subscribers-llvm-selectiondag Author: None (aabhinavg1) Changes…ILS as safe This patch updates This is formally verified using Alive2:
These patterns are safe due to the use of Includes test coverage to ensure correctness. Differential Revision: <fill this after uploading to Phabricator or GH> Full diff: https://github.com/llvm/llvm-project/pull/148191.diff 2 Files Affected:
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index c1356239ad206..78f00809e3862 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -5542,6 +5542,10 @@ bool SelectionDAG::canCreateUndefOrPoison(SDValue Op, const APInt &DemandedElts,
case ISD::UADDSAT:
case ISD::SSUBSAT:
case ISD::USUBSAT:
+ case ISD::AVGFLOORS:
+ case ISD::AVGFLOORU:
+ case ISD::AVGCEILS:
+ case ISD::AVGCEILU:
case ISD::MULHU:
case ISD::MULHS:
case ISD::SMIN:
diff --git a/llvm/test/CodeGen/AArch64/avgfloor-u8.ll b/llvm/test/CodeGen/AArch64/avgfloor-u8.ll
new file mode 100644
index 0000000000000..669f6bbad14a3
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/avgfloor-u8.ll
@@ -0,0 +1,16 @@
+; RUN: llc < %s -march=arm64 -mcpu=apple-m1 | FileCheck %s
+
+; CHECK-LABEL: avg:
+; CHECK: add
+; CHECK: lsr
+; CHECK: ret
+
+define zeroext i8 @avg(i8 noundef zeroext %a, i8 noundef zeroext %b) {
+entry:
+ %conv = zext i8 %a to i16
+ %conv1 = zext i8 %b to i16
+ %add = add nuw nsw i16 %conv1, %conv
+ %div3 = lshr i16 %add, 1
+ %conv2 = trunc nuw i16 %div3 to i8
+ ret i8 %conv2
+}
|
@llvm/pr-subscribers-backend-aarch64 Author: None (aabhinavg1) Changes…ILS as safe This patch updates This is formally verified using Alive2:
These patterns are safe due to the use of Includes test coverage to ensure correctness. Differential Revision: <fill this after uploading to Phabricator or GH> Full diff: https://github.com/llvm/llvm-project/pull/148191.diff 2 Files Affected:
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index c1356239ad206..78f00809e3862 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -5542,6 +5542,10 @@ bool SelectionDAG::canCreateUndefOrPoison(SDValue Op, const APInt &DemandedElts,
case ISD::UADDSAT:
case ISD::SSUBSAT:
case ISD::USUBSAT:
+ case ISD::AVGFLOORS:
+ case ISD::AVGFLOORU:
+ case ISD::AVGCEILS:
+ case ISD::AVGCEILU:
case ISD::MULHU:
case ISD::MULHS:
case ISD::SMIN:
diff --git a/llvm/test/CodeGen/AArch64/avgfloor-u8.ll b/llvm/test/CodeGen/AArch64/avgfloor-u8.ll
new file mode 100644
index 0000000000000..669f6bbad14a3
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/avgfloor-u8.ll
@@ -0,0 +1,16 @@
+; RUN: llc < %s -march=arm64 -mcpu=apple-m1 | FileCheck %s
+
+; CHECK-LABEL: avg:
+; CHECK: add
+; CHECK: lsr
+; CHECK: ret
+
+define zeroext i8 @avg(i8 noundef zeroext %a, i8 noundef zeroext %b) {
+entry:
+ %conv = zext i8 %a to i16
+ %conv1 = zext i8 %b to i16
+ %add = add nuw nsw i16 %conv1, %conv
+ %div3 = lshr i16 %add, 1
+ %conv2 = trunc nuw i16 %div3 to i8
+ ret i8 %conv2
+}
|
Code change looks fine but I don't understand what the test you added actually shows.
Don't add this! We haven't used this since we moved away from Phabricator for code reviews. |
Also, I don't understand how your alive2 links are relevant. They verify an expansion of avgfloors/avgceils, but what does this have to do with introducing poison? |
Hi @jayfoad I included the Alive2 links to show that the lowering of AVGFLOORS and AVGCEILS avoids poison, but I get that it doesn't directly prove that the SelectionDAG op itself is safe. If you don’t mind, could you suggest what kind of test cases would better show that these ops don’t introduce poison or undef? |
You need to remove the noundef and compare the effect of freezing the result vs the inputs:
(if you remove either of the dst freezes it should fail) |
case ISD::AVGFLOORS: | ||
case ISD::AVGFLOORU: | ||
case ISD::AVGCEILS: | ||
case ISD::AVGCEILU: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you need test coverage for all 4 patterns
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HI @RKSimon
Thanks for the feedback!
Alive2 Proofs for AVG Operations
Opcode | Operation Type | Alive2 Proof Link |
---|---|---|
AVGFLOORS |
Signed Floor Avg | Alive2 Link |
AVGCEILS |
Signed Ceil Avg | Alive2 Link |
AVGFLOORU |
Unsigned Floor Avg | Alive2 Link |
AVGCEILU |
Unsigned Ceil Avg | Alive2 Link |
Let me know if need to add anything else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These proofs are weird: the whole point of the and/xor/ashr/add implementation is that you don't need to extend to a wider type, you can do all operations in i8.
Could you add a tests to llvm/test/CodeGen/AArch64/hadd-combine.ll that use the aarch64.neon.uhadd/urhadd intrinsics? They should be converted to these avg nodes so can be tested directly. |
Hi @davemgreen, Thanks for the suggestion! I've drafted the following test cases in llvm/test/CodeGen/AArch64/hadd-combine.ll to exercise the aarch64.neon.uhadd and aarch64.neon.urhadd intrinsics: ; RUN: llc < %s -mtriple=aarch64-none-linux-gnu -mattr=+neon | FileCheck %s
declare <8 x i8> @llvm.aarch64.neon.uhadd.v8i8(<8 x i8>, <8 x i8>)
declare <8 x i8> @llvm.aarch64.neon.urhadd.v8i8(<8 x i8>, <8 x i8>)
declare <16 x i8> @llvm.aarch64.neon.uhadd.v16i8(<16 x i8>, <16 x i8>)
declare <16 x i8> @llvm.aarch64.neon.urhadd.v16i8(<16 x i8>, <16 x i8>)
; CHECK-LABEL: test_uhadd_v8i8:
; CHECK: uhadd v0.8b, v0.8b, v1.8b
define <8 x i8> @test_uhadd_v8i8(<8 x i8> %a, <8 x i8> %b) {
%res = call <8 x i8> @llvm.aarch64.neon.uhadd.v8i8(<8 x i8> %a, <8 x i8> %b)
ret <8 x i8> %res
}
; CHECK-LABEL: test_urhadd_v8i8:
; CHECK: urhadd v0.8b, v0.8b, v1.8b
define <8 x i8> @test_urhadd_v8i8(<8 x i8> %a, <8 x i8> %b) {
%res = call <8 x i8> @llvm.aarch64.neon.urhadd.v8i8(<8 x i8> %a, <8 x i8> %b)
ret <8 x i8> %res
}
; CHECK-LABEL: test_uhadd_v16i8:
; CHECK: uhadd v0.16b, v0.16b, v1.16b
define <16 x i8> @test_uhadd_v16i8(<16 x i8> %a, <16 x i8> %b) {
%res = call <16 x i8> @llvm.aarch64.neon.uhadd.v16i8(<16 x i8> %a, <16 x i8> %b)
ret <16 x i8> %res
}
; CHECK-LABEL: test_urhadd_v16i8:
; CHECK: urhadd v0.16b, v0.16b, v1.16b
define <16 x i8> @test_urhadd_v16i8(<16 x i8> %a, <16 x i8> %b) {
%res = call <16 x i8> @llvm.aarch64.neon.urhadd.v16i8(<16 x i8> %a, <16 x i8> %b)
ret <16 x i8> %res
} Would this be acceptable for verifying that the uhadd/urhadd intrinsics get correctly lowered into the corresponding DAG avg patterns? |
What could be done is to fold based on the knownbits/signbits result of a avgu/avgs node - if the freeze is moved then the knownbits/signbits calculation is exposed and can allow the fold to occur. |
Hi @RKSimon here is the test could you please look in to it if i have not missed anything ; RUN: opt -passes=instcombine -S < %s | FileCheck %s
; === Signed Floor Average (AVGFLOORS) ===
define i8 @avg_signed_floor_freeze_before(i8 %a, i8 %b) {
; CHECK-LABEL: @avg_signed_floor_freeze_before(
; CHECK: freeze i8
entry:
%fa = freeze i8 %a
%fb = freeze i8 %b
%a32 = sext i8 %fa to i32
%b32 = sext i8 %fb to i32
%and = and i32 %a32, %b32
%xor = xor i32 %a32, %b32
%shr = ashr i32 %xor, 1
%sum = add i32 %and, %shr
%res = trunc i32 %sum to i8
ret i8 %res
}
define i8 @avg_signed_floor_freeze_after(i8 %a, i8 %b) {
; CHECK-LABEL: @avg_signed_floor_freeze_after(
; CHECK: freeze i8
entry:
%a32 = sext i8 %a to i32
%b32 = sext i8 %b to i32
%and = and i32 %a32, %b32
%xor = xor i32 %a32, %b32
%shr = ashr i32 %xor, 1
%sum = add i32 %and, %shr
%res = trunc i32 %sum to i8
%fr = freeze i8 %res
ret i8 %fr
}
; === Signed Ceil Average (AVGCEILS) ===
define i8 @avg_signed_ceil_freeze_before(i8 %a, i8 %b) {
; CHECK-LABEL: @avg_signed_ceil_freeze_before(
; CHECK: freeze i8
entry:
%fa = freeze i8 %a
%fb = freeze i8 %b
%a32 = sext i8 %fa to i32
%b32 = sext i8 %fb to i32
%add = add i32 %a32, %b32
%add1 = add i32 %add, 1
%shr = ashr i32 %add1, 1
%res = trunc i32 %shr to i8
ret i8 %res
}
define i8 @avg_signed_ceil_freeze_after(i8 %a, i8 %b) {
; CHECK-LABEL: @avg_signed_ceil_freeze_after(
; CHECK: freeze i8
entry:
%a32 = sext i8 %a to i32
%b32 = sext i8 %b to i32
%add = add i32 %a32, %b32
%add1 = add i32 %add, 1
%shr = ashr i32 %add1, 1
%res = trunc i32 %shr to i8
%fr = freeze i8 %res
ret i8 %fr
}
; === Unsigned Floor Average (AVGFLOORU) ===
define i8 @avg_unsigned_floor_freeze_before(i8 %a, i8 %b) {
; CHECK-LABEL: @avg_unsigned_floor_freeze_before(
; CHECK: freeze i8
entry:
%fa = freeze i8 %a
%fb = freeze i8 %b
%a32 = zext i8 %fa to i32
%b32 = zext i8 %fb to i32
%and = and i32 %a32, %b32
%xor = xor i32 %a32, %b32
%shr = lshr i32 %xor, 1
%sum = add i32 %and, %shr
%res = trunc i32 %sum to i8
ret i8 %res
}
define i8 @avg_unsigned_floor_freeze_after(i8 %a, i8 %b) {
; CHECK-LABEL: @avg_unsigned_floor_freeze_after(
; CHECK: freeze i8
entry:
%a32 = zext i8 %a to i32
%b32 = zext i8 %b to i32
%and = and i32 %a32, %b32
%xor = xor i32 %a32, %b32
%shr = lshr i32 %xor, 1
%sum = add i32 %and, %shr
%res = trunc i32 %sum to i8
%fr = freeze i8 %res
ret i8 %fr
}
; === Unsigned Ceil Average (AVGCEILU) ===
define i8 @avg_unsigned_ceil_freeze_before(i8 %a, i8 %b) {
; CHECK-LABEL: @avg_unsigned_ceil_freeze_before(
; CHECK: freeze i8
entry:
%fa = freeze i8 %a
%fb = freeze i8 %b
%a32 = zext i8 %fa to i32
%b32 = zext i8 %fb to i32
%add = add i32 %a32, %b32
%add1 = add i32 %add, 1
%shr = lshr i32 %add1, 1
%res = trunc i32 %shr to i8
ret i8 %res
}
define i8 @avg_unsigned_ceil_freeze_after(i8 %a, i8 %b) {
; CHECK-LABEL: @avg_unsigned_ceil_freeze_after(
; CHECK: freeze i8
entry:
%a32 = zext i8 %a to i32
%b32 = zext i8 %b to i32
%add = add i32 %a32, %b32
%add1 = add i32 %add, 1
%shr = lshr i32 %add1, 1
%res = trunc i32 %shr to i8
%fr = freeze i8 %res
ret i8 %fr
}
I’ve added
Could you please review if I’ve missed anything? |
%div3 = lshr i16 %add, 1 | ||
%conv2 = trunc nuw i16 %div3 to i8 | ||
ret i8 %conv2 | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem you have if you attempt to freeze the expanded avg pattern - is that the freeze might get folded through instructions BEFORE the avg node is created. You're going to have to use the aarch64 hadd/rhadd intrinsics and rely on a fold that uses knownbits/signbits that is blocked if the freeze it still present and(avg(and(x,const0),and(y,const1)),const2)
might work for avgceilu/avgflooru for instance.
llvm\test\CodeGen\X86\freeze-binary.ll has some examples of frozen calls of other intrinsics to give you some ideas that you could try for @llvm.aarch64.neon.shadd.v8i16 etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @RKSimon
I've now added a separate AArch64 test (avg-freeze.ll) that demonstrates freeze applied after the hadd/rhadd
intrinsics (shadd, srhadd, uhadd, urhadd)
. This avoids blocking instruction selection and ensures correctness. The tests include:
- freeze_shadd_vec
- freeze_srhadd_vec
- freeze_uhadd_vec
- freeze_urhadd_vec
The logic is similar to how freeze-preserving intrinsic selection is handled in
freeze-binary.ll
forX86
.
testcase
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc < %s -mtriple=aarch64-- | FileCheck %s --check-prefixes=AARCH64
; This test verifies that applying `freeze` to the result of AArch64 hadd/rhadd intrinsics
; does not interfere with instruction selection.
; The intrinsic is computed first, then frozen. Since `freeze` is applied after,
; it does not block the selection of `shadd`, `srhadd`, `uhadd`, or `urhadd`.
declare <8 x i16> @llvm.aarch64.neon.shadd.v8i16(<8 x i16>, <8 x i16>)
declare <8 x i16> @llvm.aarch64.neon.srhadd.v8i16(<8 x i16>, <8 x i16>)
declare <8 x i16> @llvm.aarch64.neon.uhadd.v8i16(<8 x i16>, <8 x i16>)
declare <8 x i16> @llvm.aarch64.neon.urhadd.v8i16(<8 x i16>, <8 x i16>)
define <8 x i16> @freeze_shadd_vec(<8 x i16> %a, <8 x i16> %b) {
; AARCH64-LABEL: freeze_shadd_vec:
; AARCH64: shadd v0.8h, v0.8h, v1.8h
; AARCH64-NEXT: ret
%avg = call <8 x i16> @llvm.aarch64.neon.shadd.v8i16(<8 x i16> %a, <8 x i16> %b)
%frozen = freeze <8 x i16> %avg
ret <8 x i16> %frozen
}
define <8 x i16> @freeze_srhadd_vec(<8 x i16> %a, <8 x i16> %b) {
; AARCH64-LABEL: freeze_srhadd_vec:
; AARCH64: srhadd v0.8h, v0.8h, v1.8h
; AARCH64-NEXT: ret
%avg = call <8 x i16> @llvm.aarch64.neon.srhadd.v8i16(<8 x i16> %a, <8 x i16> %b)
%frozen = freeze <8 x i16> %avg
ret <8 x i16> %frozen
}
define <8 x i16> @freeze_uhadd_vec(<8 x i16> %a, <8 x i16> %b) {
; AARCH64-LABEL: freeze_uhadd_vec:
; AARCH64: uhadd v0.8h, v0.8h, v1.8h
; AARCH64-NEXT: ret
%avg = call <8 x i16> @llvm.aarch64.neon.uhadd.v8i16(<8 x i16> %a, <8 x i16> %b)
%frozen = freeze <8 x i16> %avg
ret <8 x i16> %frozen
}
define <8 x i16> @freeze_urhadd_vec(<8 x i16> %a, <8 x i16> %b) {
; AARCH64-LABEL: freeze_urhadd_vec:
; AARCH64: urhadd v0.8h, v0.8h, v1.8h
; AARCH64-NEXT: ret
%avg = call <8 x i16> @llvm.aarch64.neon.urhadd.v8i16(<8 x i16> %a, <8 x i16> %b)
%frozen = freeze <8 x i16> %avg
ret <8 x i16> %frozen
}
llvm-lit report
./llvm-lit ../../llvm/test/CodeGen/AArch64/avg-freeze.ll
-- Testing: 1 tests, 1 workers --
PASS: LLVM :: CodeGen/AArch64/avg-freeze.ll (1 of 1)
Testing Time: 0.16s
Total Discovered Tests: 1
Passed: 1 (100.00%)
Let me know if you'd like to see this added to a different file or want to keep a different name for the file coverage!
or i need to modify anything in the testcase
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I had in mind was something like this:
define <8 x i16> @src(<8 x i16> %a0, <8 x i16> %a1) {
%m0 = and <8 x i16> %a0, splat (i16 15)
%m1 = and <8 x i16> %a1, splat (i16 15)
%avg = call <8 x i16> @llvm.aarch64.neon.uhadd.v8i16(<8 x i16> %m0, <8 x i16> %m1)
%frozen = freeze <8 x i16> %avg
%mask = and <8 x i16> %frozen, splat (i16 31)
ret <8 x i16> %mask
}
define <8 x i16> @tgt(<8 x i16> %a0, <8 x i16> %a1) {
%f0 = freeze <8 x i16> %a0
%f1 = freeze <8 x i16> %a1
%m0 = and <8 x i16> %f0, splat (i16 15)
%m1 = and <8 x i16> %f1, splat (i16 15)
%avg = call <8 x i16> @llvm.aarch64.neon.uhadd.v8i16(<8 x i16> %m0, <8 x i16> %m1)
%mask = and <8 x i16> %avg, splat (i16 31)
ret <8 x i16> %mask
}
But this doesn't work yet due to a missing support for AArch64ISD::MOVIshift in AArch64TargetLowering::computeKnownBitsForTargetNode
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, the src/tgt version highlights a more interesting optimization around freeze placement and known bits. As you mentioned, it doesn't work yet because support for AArch64ISD::MOVIshift
is missing in computeKnownBitsForTargetNode
.
What you said makes sense we can look into adding that support in a follow-up patch. Once it's there, we can add the more advanced test to properly check how known bits interact with freeze.
For this patch, I suggest we go ahead with the current tests. They still serve the main purpose, which is to make sure that freeze after hadd/rhadd doesn't interfere with instruction selection.
Let me know if you're good with that. If yes, we can land this one and open a new patch for the MOVIshift support and the src/tgt test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @RKSimon Is this test structure OK?
patterns into this unified hadd-freeze.ll after #148634 landed.
; RUN: llc < %s -march=arm64 -mattr=+neon | FileCheck %s
declare <8 x i16> @llvm.aarch64.neon.uhadd.v8i16(<8 x i16>, <8 x i16>)
declare <8 x i16> @llvm.aarch64.neon.urhadd.v8i16(<8 x i16>, <8 x i16>)
declare <8 x i16> @llvm.aarch64.neon.shadd.v8i16(<8 x i16>, <8 x i16>)
declare <8 x i16> @llvm.aarch64.neon.srhadd.v8i16(<8 x i16>, <8 x i16>)
;===---------------------------------------------------------------------===;
; Test: freeze does not block uhadd instruction selection
;===---------------------------------------------------------------------===;
define <8 x i16> @uhadd_freeze(<8 x i16> %a0, <8 x i16> %a1) {
; CHECK-LABEL: uhadd_freeze:
; CHECK: uhadd
; CHECK: and
%m0 = and <8 x i16> %a0, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>
%m1 = and <8 x i16> %a1, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>
%avg = call <8 x i16> @llvm.aarch64.neon.uhadd.v8i16(<8 x i16> %m0, <8 x i16> %m1)
%frozen = freeze <8 x i16> %avg
%masked = and <8 x i16> %frozen, <i16 31, i16 31, i16 31, i16 31, i16 31, i16 31, i16 31, i16 31>
ret <8 x i16> %masked
}
;===---------------------------------------------------------------------===;
; Test: freeze does not block urhadd instruction selection
;===---------------------------------------------------------------------===;
define <8 x i16> @urhadd_freeze(<8 x i16> %a0, <8 x i16> %a1) {
; CHECK-LABEL: urhadd_freeze:
; CHECK: urhadd
; CHECK: and
%m0 = and <8 x i16> %a0, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>
%m1 = and <8 x i16> %a1, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>
%avg = call <8 x i16> @llvm.aarch64.neon.urhadd.v8i16(<8 x i16> %m0, <8 x i16> %m1)
%frozen = freeze <8 x i16> %avg
%masked = and <8 x i16> %frozen, <i16 31, i16 31, i16 31, i16 31, i16 31, i16 31, i16 31, i16 31>
ret <8 x i16> %masked
}
;===---------------------------------------------------------------------===;
; Test: freeze does not block shadd instruction selection
;===---------------------------------------------------------------------===;
define <8 x i16> @shadd_freeze(<8 x i16> %a0, <8 x i16> %a1) {
; CHECK-LABEL: shadd_freeze:
; CHECK: shadd
; CHECK: and
%m0 = and <8 x i16> %a0, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>
%m1 = and <8 x i16> %a1, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>
%avg = call <8 x i16> @llvm.aarch64.neon.shadd.v8i16(<8 x i16> %m0, <8 x i16> %m1)
%frozen = freeze <8 x i16> %avg
%masked = and <8 x i16> %frozen, <i16 31, i16 31, i16 31, i16 31, i16 31, i16 31, i16 31, i16 31>
ret <8 x i16> %masked
}
;===---------------------------------------------------------------------===;
; Test: freeze does not block srhadd instruction selection
;===---------------------------------------------------------------------===;
define <8 x i16> @srhadd_freeze(<8 x i16> %a0, <8 x i16> %a1) {
; CHECK-LABEL: srhadd_freeze:
; CHECK: srhadd
; CHECK: and
%m0 = and <8 x i16> %a0, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>
%m1 = and <8 x i16> %a1, <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>
%avg = call <8 x i16> @llvm.aarch64.neon.srhadd.v8i16(<8 x i16> %m0, <8 x i16> %m1)
%frozen = freeze <8 x i16> %avg
%masked = and <8 x i16> %frozen, <i16 31, i16 31, i16 31, i16 31, i16 31, i16 31, i16 31, i16 31>
ret <8 x i16> %masked
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, a hadd-freeze.ll file should be fine - please get the patch updated to allow us to review the diffs. I'm expecting you will need to come up with a different approach for signed hadd tests as otherwise they will just fold to unsigned variants if they are known poisitive.
3e3632e
to
6999bf3
Compare
declare <8 x i16> @llvm.aarch64.neon.shadd.v8i16(<8 x i16>, <8 x i16>) | ||
declare <8 x i16> @llvm.aarch64.neon.srhadd.v8i16(<8 x i16>, <8 x i16>) | ||
|
||
;===---------------------------------------------------------------------===; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove these ;===-----
's. They don't add much.
…ILS as safe This patch updates `SelectionDAG::canCreateUndefOrPoison` to indicate that `ISD::AVGFLOORS` and `ISD::AVGCEILS` do not introduce poison or undef values. | Opcode | Operation Type | Alive2 Proof Link | |--------------|-------------------|--------------------| | `AVGFLOORS` | Signed Floor Avg | [Alive2 Link](https://alive2.llvm.org/ce/z/Dwy8a5) | | `AVGCEILS` | Signed Ceil Avg | [Alive2 Link](https://alive2.llvm.org/ce/z/_JKF8A) | | `AVGFLOORU` | Unsigned Floor Avg| [Alive2 Link](https://alive2.llvm.org/ce/z/2-B6RM) | | `AVGCEILU` | Unsigned Ceil Avg | [Alive2 Link](https://alive2.llvm.org/ce/z/t5WZ6K) | These patterns are safe due to the use of `sext i8` into `i32`, which ensures no signed overflow occurs. The arithmetic is done in the wider domain before truncating safely back to `i8`. Includes test coverage to ensure correctness.
6999bf3
to
ae97069
Compare
; CHECK-NEXT: and v1.16b, v1.16b, v2.16b | ||
; CHECK-NEXT: movi v2.8h, #31 | ||
; CHECK-NEXT: uhadd v0.8h, v0.8h, v1.8h | ||
; CHECK-NEXT: and v0.16b, v0.16b, v2.16b |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This and should have gone - any idea why it hasn't?
; CHECK-NEXT: and v1.16b, v1.16b, v2.16b | ||
; CHECK-NEXT: movi v2.8h, #31 | ||
; CHECK-NEXT: urhadd v0.8h, v0.8h, v1.8h | ||
; CHECK-NEXT: and v0.16b, v0.16b, v2.16b |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This and should have gone - any idea why it hasn't?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LLVM must be conservative around freeze to avoid miscompilation. Instruction selection may proceed, but value range assumptions cannot be made post-freeze.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it help if you apply #149323?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I tested with PR [#149323]
After rebuilding and rerunning llc
on the neon-hadd-freeze.ll
test, the instruction:
and v1.16b, v1.16b, v2.16b
is still present in the output for both uhadd_freeze
and urhadd_freeze
.
So it seems the patch does not remove the redundant and
as expected.
…des can't create undef/poison Possible fix for failed fold in llvm#148191
This patch updates
SelectionDAG::canCreateUndefOrPoison
to indicate thatISD::AVGFLOORS
andISD::AVGCEILS
do not introduce poison or undef values.Alive2 Proofs for AVG Operations
AVGFLOORS
AVGCEILS
AVGFLOORU
AVGCEILU
These patterns are safe due to the use of
sext i8
intoi32
, which ensures no signed overflow occurs. The arithmetic is done in the wider domain before truncating safely back toi8
.Includes test coverage to ensure correctness.
Fixes: #147696